Towards Modeling and Model Checking Fault-Tolerant Distributed Algorithms
نویسندگان
چکیده
Fault-tolerant distributed algorithms are central for building reliable, spatially distributed systems. In order to ensure that these algorithms actually make systems more reliable, we must ensure that these algorithms are actually correct. Unfortunately, model checking state-ofthe-art fault-tolerant distributed algorithms (such as Paxos) is currently out of reach except for very small systems. In order to be eventually able to automatically verify such fault-tolerant distributed algorithms also for larger systems, several problems have to be addressed. In this paper, we consider modeling and verification of fault-tolerant algorithms that basically only contain threshold guards to control the flow of the algorithm. As threshold guards are widely used in fault-tolerant distributed algorithms (and also in Paxos), efficient methods to handle them bring us closer to the above mentioned goal. As a case study we use the reliable broadcasting algorithm by Srikanth and Toueg that tolerates even Byzantine faults. We show how one can model this basic fault-tolerant distributed algorithm in Promela such that safety and liveness properties can be efficiently verified in Spin. We provide experimental data also for other distributed algorithms.
منابع مشابه
Challenges in Model Checking of Fault-tolerant Designs in TLA
Although, historically, fault tolerance is connected to safetycritical systems, there has been an increasing interest in fault tolerance in mainstream application such as the cloud. There is a need for formal specification and verification of industrial fault-tolerant designs, since they integrate, in a non-trivial way, the ideas from distributed algorithms, whose correctness is usually based o...
متن کاملStarting a Dialog between Model Checking and Fault-tolerant Distributed Algorithms
Fault-tolerant distributed algorithms are central for building reliable spatially distributed systems. Unfortunately, the lack of a canonical precise framework for fault-tolerant algorithms is an obstacle for both verification and deployment. In this paper, we introduce a new domainspecific framework to capture the behavior of fault-tolerant distributed algorithms in an adequate and precise way...
متن کاملAccuracy of Message Counting Abstraction in Fault-Tolerant Distributed Algorithms
Fault-tolerant distributed algorithms are a vital part of mission-critical distributed systems. In principle, automatic verification can be used to ensure the absence of bugs in such algorithms. In practice however, model checking tools will only establish the correctness of distributed algorithms if message passing is encoded efficiently. In this paper, we consider abstractions suitable for ma...
متن کاملWhat You Always Wanted to Know About Model Checking of Fault-Tolerant Distributed Algorithms
Distributed algorithms have numerous mission-critical applications in embedded avionic and automotive systems, cloud computing, computer networks, hardware design, and the internet of things. Although distributed algorithms exhibit complex interactions with their computing environment and are difficult to understand for human engineers, computer science has developed only very limited tool supp...
متن کاملTutorial on Parameterized Model Checking of Fault-Tolerant Distributed Algorithms
Recently we introduced an abstraction method for parameterized model checking of threshold-based fault-tolerant distributed algorithms. We showed how to verify distributed algorithms without fixing the size of the system a priori. As is the case for many other published abstraction techniques, transferring the theory into a running tool is a challenge. It requires understanding of several verif...
متن کامل